2024-12-12 14:14:40 · AIbase
Harvard University to Release Massive Free AI Training Dataset Funded by OpenAI and Microsoft
Harvard University announced on Thursday the release of a high-quality dataset of nearly one million public domain books that anyone can use to train large language models and other AI tools. The dataset was created by Harvard's newly established Institutional Data Initiative with funding from Microsoft and OpenAI. All of the books were scanned by the Google Books Project and are no longer under copyright protection.